 similarity measure


Spectral Analysis of Representational Similarity with Limited Neurons

Neural Information Processing Systems

Understanding representational similarity between neural recordings and computational models is essential for neuroscience, yet remains challenging to measure reliably due to the constraints on the number of neurons that can be recorded simultaneously. In this work, we apply tools from Random Matrix Theory to investigate how such limitations affect similarity measures, focusing on Centered Kernel Alignment (CKA) and Canonical Correlation Analysis (CCA). We propose an analytical framework for representational similarity analysis that relates measured similarities to the spectral properties of the underlying representations. We demonstrate that neural similarities are systematically underestimated under finite neuron sampling, mainly due to eigenvector delocalization. Moreover, for power-law population spectra, we show that the number of localized eigenvectors scales as the square root of the number of recorded neurons, providing a simple rule of thumb for practitioners. To overcome sampling bias, we introduce a denoising method to infer population-level similarity, enabling accurate analysis even with small neuron samples. Theoretical predictions are validated on synthetic and real datasets, offering practical strategies for interpreting neural data under finite sampling constraints.
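As a rough illustration of the kind of measurement this abstract discusses, the sketch below computes linear CKA between two response matrices and recomputes it on random subsets of neurons. It uses the standard linear CKA formula, not the paper's analytical framework or its denoising method. The function names, the synthetic data with power-law latent variances, and all sizes are assumptions made for this example.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two (stimuli x neurons) response matrices."""
    # Center each neuron's responses across stimuli.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


def subsampled_cka(X, Y, n_neurons, rng):
    """Linear CKA computed on a random subset of neurons from each population."""
    ix = rng.choice(X.shape[1], size=n_neurons, replace=False)
    iy = rng.choice(Y.shape[1], size=n_neurons, replace=False)
    return linear_cka(X[:, ix], Y[:, iy])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_stimuli, n_latent, n_population = 200, 50, 2000  # illustrative sizes

    # Hypothetical synthetic data: shared latents whose variance decays as 1/k.
    scales = np.arange(1, n_latent + 1, dtype=float) ** -0.5
    Z = rng.standard_normal((n_stimuli, n_latent)) * scales
    X = Z @ rng.standard_normal((n_latent, n_population))
    Y = Z @ rng.standard_normal((n_latent, n_population))

    print("full population:", linear_cka(X, Y))
    for n in (10, 50, 200):
        print(f"{n} neurons:", subsampled_cka(X, Y, n, rng))
```

Comparing the subsampled values against the full-population value is one simple way to see how the estimate depends on the number of recorded neurons in a given dataset.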


Value-Aware Product Recommendation by Customer Segmentation using a suitable High-Dimensional Similarity Measure

arXiv.org Machine Learning

This paper presents a novel value-aware approach to product recommendation that simultaneously addresses the high dimensionality and sparsity of user-item data while explicitly incorporating the contribution of each product and user to overall sales revenue. The proposed framework encodes revenue contributions in the user-item matrix and computes customer similarity directly on this basis using suitable distance measures. This enables the segmentation of users according to the revenue-based similarity of their purchase baskets and supports recommendations aligned with profitability objectives. We compare conventional similarity metrics with a novel alternative tailored to high-dimensional contexts and propose three recommendation strategies based on revenue share, product popularity, and expected profit generation. The effectiveness of the proposed method is validated through simulation experiments and a real-world application using the UCI Online Retail dataset.
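The idea of encoding revenue in the user-item matrix can be sketched as follows. This is a minimal example, assuming each matrix entry is the revenue (quantity times unit price) a customer generated for a product. Cosine similarity stands in for the similarity measure, since the abstract does not specify the paper's novel high-dimensional alternative. The function names and the transaction tuple layout are hypothetical.

```python
from math import sqrt


def revenue_matrix(transactions, users, items):
    """Build a user -> revenue-vector mapping from (user, item, qty, price) tuples."""
    index = {item: j for j, item in enumerate(items)}
    rows = {user: [0.0] * len(items) for user in users}
    for user, item, quantity, unit_price in transactions:
        # Each entry accumulates revenue rather than a purchase count or rating.
        rows[user][index[item]] += quantity * unit_price
    return rows


def cosine_similarity(a, b):
    """Cosine similarity of two revenue vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    transactions = [
        ("u1", "A", 2, 10.0),
        ("u1", "B", 1, 5.0),
        ("u2", "A", 4, 10.0),
        ("u2", "B", 2, 5.0),
        ("u3", "C", 1, 100.0),
    ]
    rows = revenue_matrix(transactions, ["u1", "u2", "u3"], ["A", "B", "C"])
    print(cosine_similarity(rows["u1"], rows["u2"]))  # same basket mix, different scale
    print(cosine_similarity(rows["u1"], rows["u3"]))  # disjoint baskets
```

Customers could then be segmented by running any clustering method on these pairwise similarities; which method the paper uses is not stated in the abstract.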


Active clustering for labeling training data

Neural Information Processing Systems

[...] has a high impact on the performance of the learned function. [...] most practical cases rely on humans-in-the-loop to label the data. The process of determining the correct labels is much more expensive than comparing two items to see whether they belong to the same class. Thus motivated, we propose a setting for training data gathering where the human experts perform the comparatively cheap task of answering pairwise queries, and the computer groups the items into classes (which can be labeled cheaply at the very end of the process). Given the items, we consider two random models for the classes: one where the set partition they form is drawn uniformly, the other one where each item chooses its class independently following a fixed distribution. In the first model, we characterize the algorithms that minimize the average number of queries required to cluster the items and analyze their complexity. In the second model, we analyze a specific algorithm family, propose as a conjecture that they reach the minimum average [...] We also [...]
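The pairwise-query setting in this abstract can be illustrated with a naive baseline: each new item is compared against one representative of every existing class until the oracle reports a match. This is not one of the query-optimal algorithms the paper characterizes, only a sketch of the interface. The function name and the `same_class` oracle, which stands in for the human expert, are assumptions.

```python
def cluster_by_pairwise_queries(items, same_class):
    """Group items using only "same class?" queries; return (clusters, query count)."""
    clusters = []
    queries = 0
    for item in items:
        for cluster in clusters:
            queries += 1
            # One representative per cluster is enough: class membership is transitive.
            if same_class(item, cluster[0]):
                cluster.append(item)
                break
        else:
            # No existing cluster matched, so the item starts a new class.
            clusters.append([item])
    return clusters, queries


if __name__ == "__main__":
    # Hidden ground-truth labels, used only to simulate the expert's answers.
    labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
    clusters, queries = cluster_by_pairwise_queries(
        list("abcde"), lambda x, y: labels[x] == labels[y]
    )
    print(clusters)  # [['a', 'c'], ['b', 'e'], ['d']]
    print(queries)   # 6
```

The query count depends on the order in which items and clusters are visited, which is the quantity the abstract's two random models are set up to analyze on average.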







Supplement to "Estimating Riemannian Metric with Noise-Contaminated Intrinsic Distance"

Neural Information Processing Systems

Unlike distance metric learning, where the downstream tasks that use the estimated distance metric are the usual focus, our proposal focuses on the estimated metric itself as a characterization of the geometric structure. Despite the illustrated taxi and MNIST examples, finding more compelling applications that target the geometry of the data space remains open. Interpreting mathematical concepts such as the Riemannian metric and geodesics in the context of potential applications (e.g., cognition and perception research, where similarity measures are common) could be inspiring. Our proposal requires sufficiently dense data, which can be demanding, especially for high-dimensional data, because of the curse of dimensionality. Dimensionality reduction (e.g., manifold embedding as in the MNIST example) can substantially alleviate the curse of dimensionality, making the dense-data requirement more likely to hold.